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This paper introduces a nonparametric algorithm for bootstrap- 
ping a stationary random field and proves certain consistency proper- 
ties of the algorithm for the case of mixing random fields. The moti- 
vation for this paper comes from relating a heuristic texture synthesis 
algorithm popular in computer vision to general nonparametric boot- 
strapping of stationary random fields. We give a formal resampling 
scheme for the heuristic texture algorithm and prove that it produces 
a consistent estimate of the joint distribution of pixels in a window 
of certain size under mixing and regularity conditions on the random 
field. The joint distribution of pixels is the quantity of interest here 
because theories of human perception of texture suggest that two 
textures with the same joint distribution of pixel values in a suit- 
ably chosen window will appear similar to a human. Thus we provide 
theoretical justification for an algorithm that has already been very 
successful in practice, and suggest an explanation for its perceptually 
good results. 

1. Introduction. Texture is one of the central concepts in computer vi- 
sion and image analysis. The term is generally used to refer to images of 
repeated patterns with local variations, such as waves, sand or human tis- 
sue. The stochastic nature of texture variations, not necessarily present in 
other real images, makes it a particularly natural area for applying statis- 
tical methods. While many texture algorithms are deterministic and based 
on heuristics rather than probability models, we found that a statistical 
framework can help to understand, justify and improve such algorithms; in 
turn, issues arising in texture algorithms can lead to questions of general 
statistical interest. 

The problem of texture synthesis, which lies at the intersection of com- 
puter vision and computer graphics, is the problem of producing a new 
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texture image which looks hke a given texture, but is not exactly the same. 
It is frequently used in computer graphics to "paint" textures on surfaces, 
and can also be used for image compression, where the whole texture can be 
recreated from a small sample. The point of departure for our research is a 
simple and very popular heuristic resampling algorithm for texture synthesis 
[9] which produces excellent visual results but has no theoretical justification 
or statistical framework. 

We formalize this algorithm in the framework of resampling from random 
fields and prove that it provides consistent estimates of the joint distribution 
of pixels in a window of specified size. The interest in the joint distribution 
of pixels in a window comes from theories of human perception of texture. 
The study of human pre-attentive texture discrimination was pioneered by 
Julesz in the 1960s and 1970s [12, 13, 14]. His original conjecture was that 
textures appear indistinguishable to humans if they have identical first- and 
second-order statistics, and was later extended to higher-order statistics (i.e., 
joint distributions of pairs, triples, etc.). When textures are viewed as ran- 
dom fields on a lattice, they are often assumed stationary and Markovian, 
in which case the distribution of k pixels in the Markov neighborhood de- 
termines the fe-order statistics, and the whole distribution. 

A more modern view of texture perception is that the cells in the visual 
cortex respond to primitive stimuli like bars, edges, dots, and so on, at dif- 
ferent frequencies and orientations. Psychophysical and neurophysiological 
experiments suggest that the brain performs multichannel spatial frequency 
and orientation initial analysis of any image formed on the retina and not 
just texture [6, 11]. These and other similar findings inspired the multichan- 
nel filtering approaches, which use distributions of filter responses for texture 
discrimination, with texture boundaries corresponding to sudden changes in 
the intensity of "firing" of some of the filters. A comprehensive texture per- 
ception model based on this idea was proposed by Malik and Perona [19], 
and many filter-based methods were developed subsequently. This view also 
supports the claim that the joint distribution of k neighboring pixels deter- 
mines texture appearance, since the joint distribution of pixel intensities in 
the filter support window determines the distribution of filter responses. In 
our view, these two interpretations of human perception complement each 
other, and both point to the joint distribution of pixels in a window as the 
key quantity. 

In computer vision, texture synthesis algorithms are ultimately evaluated 
by human visual assessment of synthesized texture. Here we provide a proof 
that, according to theories of human perception, the algorithm of Efros 
and Leung can be expected to produce good visual results. To the best of 
our knowledge, the only other texture synthesis algorithm in the literature 
with a mathematical justification is FRAME [28], but, unfortunately, it does 
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not produce very good visual results in practice, whereas the algorithm 
considered here does. 

This paper converts the Efros and Leung algorithm into a formal boot- 
strap scheme for resampling stationary random fields. The bootstrap tech- 
niques for stationary random fields in the statistical literature are primarily 
used for estimating the mean and the variance of the random field, a goal 
very different from synthesis or estimating the joint distribution. The main 
tool used in this context is the moving block bootstrap (MBB) and its vari- 
ants. MBB was first introduced for time series [15, 18] and extended to 
general random fields by Politis and Romano [21]. It is based on resampling 
blocks independently and concatenating them, rather than resampling by 
conditioning on the neighboring blocks, which is the main difference be- 
tween our bootstrap algorithm and MBB. For time series, bootstrapping by 
conditioning on the past has been introduced by Rajarshi [24] and Papar- 
oditis and Politis [20]; here we extend their methods to stationary random 
fields. 

This paper is organized as follows. In Section 2 we give some background 
on texture synthesis and introduce the algorithm of Efros and Leung [9]. 
In Section 3 we formalize the algorithm in the framework of resampling 
from stationary random fields, and introduce a special case of Markov mesh 
models, which motivate a natural ordering on the plane. In Section 4 we 
show that both the Markov mesh version and the original algorithm produce 
consistent nonpar ametric estimates of the joint distribution of pixels in a 
patch, though the patch sizes differ for the two algorithms. This result is 
proved under the assumptions that the texture is a sample from a stationary 
mixing random field with a smooth density with compact support, and some 
minor regularity conditions. Section 5 concludes with discussion, and the 
Appendix contains all the proofs. 

2. The nonparametric sampling algorithm and previous work in texture 
synthesis. There has been a surge of interest in texture synthesis in the 
past decade, when advances in computing allowed using many computation- 
ally intensive algorithms that could not have been implemented before. The 
many different methods of texture synthesis can be broadly divided into 
three categories. The first and oldest group of methods is model-based, with 
the main modeling tool being Markov random fields (MRF's) [2, 4]. In the 
early MRF work only a few parameters could be fitted because of computa- 
tional difficulties, and those models usually did not capture the complexity 
of real textures. As the number of parameters increases, the synthesized tex- 
tures begin to look more realistic, but it also becomes hard to estimate the 
parameters reliably. 

The other broad category of texture synthesis methods is based on feature 
matching. Typically, these methods start from a white noise image and force 
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it to match some set of statistics of the original texture image, such as distri- 
butions of filter responses [5, 10, 23, 25]. Feature matching methods tend to 
work well on stochastic textures but have difficulties with highly structured 
textures. Another difficulty is that they typically require some number of 
iterations to converge but iterating too many times leads to deterioration of 
the synthesized image. 

There are some methods that use both MRF models and feature matching, 
such as the FRAME model by Zhu, Wu and Mumford [28] . It provides a solid 
theoretical base for combining MRF's with feature matching, and Wu, Zhu 
and Liu [27] showed that FRAME is the natural way to establish equivalence 
between these two approaches. However, its visual results on real textures 
are unfortunately far from perfect. 

A new class of heuristic methods of texture synthesis has been devel- 
oped over the past few years, started by the algorithm of Efros and Le- 
ung [9]. Many variations of their method have been published that speed 
up and optimize the original algorithm in different ways [8, 17, 26]. In all 
these works, however, the basic resampling principle of Efros and Leung [9] 
remains unchanged, and even the original version has been very successful 
on a wider range of textures than any of the previous methods. 

The Efros and Leung algorithm is based on resampling from the random 
field directly, without constructing an explicit model for the distribution. It 
is motivated by an MRF model, that is, by the idea that the value of a given 
pixel only depends on the values of its neighbors, though it is not explicitly 
assumed that the underlying texture distribution is an MRF. 

The algorithm starts with a random "seed" from the original image, typ- 
ically a small square patch, and proceeds to grow the image from the seed 
outward, layer by layer, spiraling around and adding one pixel at a time. To 
synthesize pixel X, one conditions on 0{X), the part of the Markov neigh- 
borhood of X (taken to he w x w square) that has been filled in before X 
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Fig. 1. The nonparametric resampling algorithm. 
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Fig. 2. Some synthesis examples. The original (smaller) texture sizes are 151 x 148, 
54 X 60 and 113 x 110 pixels, respectively. 



(see Figure 1). The conditional distribution of X given 0{X) is never con- 
structed explicitly. Instead, it is resampled directly in the following way: for 
all pixels Yi in the observed image compute the distance d{0{X),0(Yi)), for 
all neighborhoods 0{Yi) of the same size and shape as 0{X). The distance 
is measured by the sum of squared differences between pixel intensities, 
weighted by a Gaussian weight function to emphasize the importance of 
close neighbors. Let 

do = mm d{0{X),0{Yi)) 

i 

be the distance to the best match in the observed image. Define the set of 
"good" matches by 

S = {Y:d{0{X),0{Y))<{l+e)do}. 

Finally, select the value for X uniformly from pixel values in S. 

Here e is a tuning parameter set by Efros and Leung to be e = 0.1 (pre- 
sumably by trial and error), and it is not meant to be changed by the user; 
this value was used in all the Efros and Leung results shown below. 

This algorithm is very simple to implement, and can be used to synthesize 
any size or shape of the desired texture, or fill holes in an existing texture. It 
has worked well on both stochastic and structured textures (see Figure 2 for 
some examples) . Note that highly structured textures require larger window 
sizes than more stochastic textures, and in general, the success of the al- 
gorithm depends on the neighborhood window being big enough to capture 
the local structure correctly, as shown in Figure 3 [9]. The smaller the win- 
dow, the more "stochastic" the synthesized image will appear. This issue is 
discussed further in Section 5; choosing the window automatically is beyond 
the scope of this paper. 

Although the algorithm produced impressive results on a large number 
of various textures, it also produced a few failures, discussed by Efros and 
Leung [9] . It appears to fail when it gets into a part of the search space with 
no good matches; in that case, it starts sampling randomly and produces 
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texture that looks rather like white noise, but the chance of that happening 
is small. For most practical purposes, the algorithm works quite well, and, 
particularly with later computational speed-ups, is the current state of the 
art in texture synthesis. 

3. Formalizing the resampling scheme. In this section we set up a for- 
mal bootstrap scheme along the lines of the synthesis algorithm heuristic. 
Our scheme is an extension to random fields of a p-order Markov bootstrap 
algorithm for time series by Paparoditis and Politis [20], and we use many 
of their techniques in the proofs. Their algorithm is, in turn, an extension of 
a first-order Markov bootstrap of Rajarshi [24]. As in the texture synthesis 
algorithm, the Markov assumption on the original time series is not needed 
in Paparoditis and Politis [20], but the bootstrapped time series reproduce 
the p-order dependence structure accurately. 

For the case of random fields, we will first consider a Markov mesh model 
(MMM), a special case of MRF, which, unlike a general MRF on the plane, 
has a natural notion of the past. 

3.1. The resampling algorithm for Markov mesh models. MMM's (also 
known as Picard random fields) were introduced by Abend, Harley and 
Kanal [1] and have been used for a variety of applications. In particular, 
Popat and Picard [22] used a parametric MMM model for texture synthesis, 
and so did Cressie and Davidson [3]. In both cases, however, results for 
natural textures were of low quality due to the small size of the conditioning 
neighborhood. Fitting all the parameters required for a larger neighborhood 
was computationally infeasible at the time, and the accuracy of estimating 
so many parameters would have been low in any case. 

To define MMM's, let {Xt,t € [l,oo)^} be a real-valued random field. For 
a point t = {ti,t2), define the index set 

Ut = {u: max(l, ti — w + 1) < ui < ti, max(l, t2 — w + 1) < U2 < t2, u ^ t} 

to be a square of size w x w with t as the bottom right corner, t itself 
excluded; notice that for the first w — 1 rows and columns Ut has to be 
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Fig. 3. Synthesis results with different window sizes. The original image is 73 x 71 pixels. 
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truncated. Let 

Wt = {u:l<ui<ti}U{u:l<U2< 

be everything to the left or above t (see Figure 4). Then a Markov mesh 
model assumes 

PiXt\Xw,) = PiXt\Xu,). 

MMM's are a special case of Markov random fields and here the correspond- 
ing Markov neighborhood Nt is a {2w — 1) x (2u) — 1) square centered at t, 
that is, P{Xt\X.t) = P{Xt\XN,). 

If the texture synthesis algorithm is to be motivated by a MMM, the 
natural way to fill in the pixels is to start in the upper left corner and 
proceed in raster order, filling in row by row. Suppose we observe the MMM 
field Xt on the index set [l,Ti] x [1,T2]. Let Ut{s) be the index set Ut shifted 
so that its bottom right corner is s: Ut{s) = (Ut — t + s). For convenience, 
define the p-dimensional vectors = Xu^ and Y((s) = Xij^(^gy Stationarity 
implies that Yt(s) are identically distributed for all t and s. 

There are — 1 possible shapes of Ut (various truncations of the w x w 
square are needed at the boundaries). For each shape consisting of p compo- 
nents {1 <p<w^ — 1), let W^^^ be a kernel on W. The kernel can be scaled 
by a resampling width b, wlf\y) = 6~^VF(^)(y/6), and satisfies some general 
smoothness assumptions we state in Section 4.1. In the synthesis examples 
below, we use the Gaussian kernel ^(^'^(y) = (27r)-P/2 gxp(-||y||2/2). Now 
we have all the components to proceed to 

The MMM resampling algorithm. 

1. Select a starting value for {X^ :ti <w,t2 l^w}, the top left w xw square. 
Typically the starting value will he a w x w square selected from the 
observed field Xt at random. 

2. Suppose X^ has been generated for {t:ti <u}U {t:ti = u, t2 <v}, that 
is, u — 1 rows are filled in completely, and row u is filled up to column 
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Fig. 4. Conditional independence structure in the Markov mesh model. 
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V. To generate the next value = X^^^^ let be a discrete random 
variable with probability mass function 

P^N = s) = ^wl,'\Y;-Yt{s)), 

where Z = J2s — Yf (s)) is a normalizing constant, p = |Y*| is the 

size of the "past" of X^ , and s ranges over all values s such that Ut{s) C 
[l,ri] X [1,T2], that is, all locations where the conditioning neighborhood 
fits within the observed texture field. 
3- Let X(;„)=XAr. 

3.2. Formalizing the general algorithm. The MMM version of the al- 
gorithm contains two modifications of the original algorithm of Efros and 
Leung [9]: the order in which the pixels are filled in the synthesized texture 
(raster instead of spiral), and the weights with which the pixels are resam- 
pled (kernel weights instead of uniform sampling from all matches within e). 
A number of comparisons we give in Section 3.3 show that both versions 
produce reasonable and fairly similar results; however, the spiral ordering of 
the original algorithm tends to have fewer problems with error propagation 
and produces somewhat more visually pleasing pictures. Therefore it is of 
interest to investigate the consistency properties of the spiral algorithm as 
well. 

Here we will think of texture as a stationary random field on rather 
than N^. Let us order all locations t G in the spiral order to ^ ti ^ ■ ■ ■ 
starting at the origin and going around clockwise: 

to = (0,0), ti = (l,0), t2 = (l,-l), t3 = (0,-l), 

t4 = (-l,-l), t5 = (-l,0), t6 = (-l,l), tr = {0,l),.... 

To avoid centering problems, we will only look at conditioning on windows 
with an odd number of pixels along the side of the square, {2w — l) x (2w — l). 
The first pixels (to, ti, • • • , t,n2-i) will be filled in by the seed, say also 
of size m = 2w — 1, and the subsequent pixels will be filled in one by one 
according to the spiral ordering. Apart from the ordering, the resampling 
scheme is exactly the same as for the MMM algorithm. 

For all t G Z^, let Xt be the pixel intensity at location t. Let 

||t||oo = max(|ti|, |t2|) 

be the ^oo norm on the plane. Let 

Ut = {s -.Wt — s\\oo < w, s ^ t} 

be the part of {2w — 1) x {2w — 1) window with t in the center that is filled 
in before t. Let Ut{s) be the index set Ut shifted so that it is centered at 
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s. Finally, let Yj = Xjj^ and Yj(s) = Xij^(^gy The dimension of Ut varies for 
different t, depending on whether it is a corner or a middle pixel, but is 
always between w{w — 1) and 2w{w — 1). So we will need a kernel W^'^'^ 
for each p, w{w — 1) < p < 2w{w — 1). Given the observed texture {Xt :t G 
[l,Ti] X [1,T2]}, the algorithm to synthesize X* can be written as follows: 

The spiral resampling algorithm. 

1. Select a random starting value for {X^ : \\t\\oo < w}, the central {2w — 
1) X {2w — 1) square, uniformly from the observed field Xt- 

2. Suppose X* have been generated for all s ^ t. To generate the next value 
X* , let be a discrete random variable with probability mass function 

P(iV = ,) = lM^W(Y*-Y,(s)), 

where Z = J2s ^b^\^t ~ ^t{s)) is a normalizing constant, p = |Y^| is 
the size of the conditioning neighborhood for X^ , and s ranges over all 
values in [l,ri] x [1,T2] such that Ut{s) € [l,ri] x [1,T2]. 

3. Let X; = Xn. 

3.3. Comparisons with the original algorithm of Efros and Leung and se- 
lection of tuning parameters. In this section we investigate the effects of 
different orderings of the synthesized pixels, different resampling weights 
and tuning parameters. Only selected comparisons are shown here for the 
obvious reasons of space limitations; the conclusions drawn are based on 
a thorough simulation study comparing all variants on a larger number of 
images. For any particular comparison, all other parameters are held fixed 
at their optimal values. 

Figure 5 shows the effects of changing the order, and also compares uni- 
form versus kernel weights. The spiral order of the original Efros and Leung 
algorithm [Figure 5(b)] does appear to produce better results than the MMM 
version [Figure 5(e)], at least for the first texture (for the second mesh tex- 
ture, all results are very similar). However, we claim that the difference is 
mainly due not to the spiral versus raster ordering, but to the fact that the 
spiral conditioning neighborhood contains twice as many close neighbors as 
the MMM "corner" neighborhood. To illustrate, we also generated textures 
in raster order [Figure 5(c), 5(d)] but conditioning on the full half-square 
above t, Ut = {n:max(l,ti — w + 1) < ui < ti,max(l,t2 — w + 1) < U2 < 
t2 + w — l,u ^ t}, a version we will refer to as rectangular (as opposed to 
spiral and corner). This is not a MMM, and it generates results very similar 
to the original algorithm [Figure 5(b), 5(c)]. The remaining slight differences 
are probably due to the fact that spiral places the seed in the middle whereas 
the raster order starts from the corner, so error propagation is worse for a 
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Fig. 5. Comparing different orderings and uniform and kernel weights, (a) Original 
textures (sizes 151 x 138 and 81 x 78 pixels); (h) Efros and Leung result (spiral scheme with 
uniform weights); (c) rectangular scheme with uniform weights; (d) rectangular scheme 
with kernel weights; (e) MMM (comer scheme) with kernel weights. The window sizes are 
w = 27 and w = 23, respectively; the bandwidths are e = 0.1 and b = 0.01. 



raster image of the same size. We also note that the speeded-up version of 
Efros and Leung [26] works in raster order with no problems, and so does 
the patch-based version in [8]. 

Using the uniform versus kernel weights [Figure 5(c), 5(d)] does not pro- 
duce any detectable differences when e and b are carefully chosen. The effect 
of varying b is shown in Figure 6 and, predictably, increasing b leads to the 
synthesized texture looking more "stochastic" and eventually becoming like 
white noise. Increasing e with uniform weights has the same effect. To make 
bandwidths choices more universal, all images are scaled to have grayscale 
values ranging from to 1. 

The effect of the window size has been shown in Figure 3, and remains 
the same for all versions. Other things being equal, larger window sizes tend 
to produce better results; however, they also make the computation costlier 
and reduce the effective sample size of the original image. 

Finally, to keep things in perspective we note that all the different versions 
of the nonparametric resampling scheme [spiral with uniform weights vs. 
corner with kernel weights shown in Figure 7(b), 7(c)] are close to each 
other and quite good when compared to other texture synthesis methods, 
such as De Bonet [5] [Figure 7(d)] and Heeger and Bergen [10] [Figure 7(e)]. 

In the next section, we show that, subject to certain mixing and regularity 
conditions, both MMM and spiral schemes with kernel weights reproduce the 
joint distribution of a pixel in a patch consistently. 
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Fig. 6. Kernel bandwidth effect, (a.) The original texture (Ti x 71 pixels); (h) 6 = 0.007 
(smallest allowed by machine precision); (c) ti = 0.01; (d) 6 = 0.1; (e) 6 = 1. All results 
generated with rectangular scheme, w = 37. 

4. Consistency results. We start by showing consistency of the Markov 
mesh model algorithm. For simplicity of notation, we ignore the side effects 
of truncation and show that the distribution of pixel value given its full 
w-hy-w neighborhood (for t not in the first w rows or columns) converges 
to the truth. We will then show that the same argument applies to truncated 
neighborhoods, and in fact to a neighborhood of an arbitrary shape, as long 
as the resampling scheme is matching it to neighborhoods of the same shape 
in the observed image. Finally, we generalize the consistency results to the 
original spiral ordering of the Efros and Leung algorithm. 

4.1. Assumptions. Let us introduce the following notation: let FY^y) = 
-P(Yt < y) be the cumulative distribution function of Yj and let Fx\Yix\y) = 
P{Xf < x\Yt = y) be the conditional distribution function of given Y^. 
Let /= [if,Ti] X [tf,r2] be the set of all pixels that admit a full conditional 
neighborhood. We assume that the size of the observed texture increases, 
that is, T = min(Ti,r2) — > oo. We also make the following fairly technical 
regularity and mixing assumptions, which, however, are not unreasonable 
for real textures (see the discussion in Section 5). The assumption of com- 




FlG. 7. Nonparametric resampling compared to other methods, (a.) The original texture 
(73 X 71 pixels); (h) Efros and Leung algorithm; (c) MMM with kernel weights (w = 37, 
6 = 0.01 (d) De Bonet algorithm; (e) Heeger and Bergen algorithm. 
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pact support (A2) is automatically satisfied for images since the number of 
grayscale or color values used is finite. 

(Al) The random field Xt is strictly stationary and mixing in the follow- 
ing sense: define mixing coefficients 

ax{k,u,v)=sviv{\P{AB) - P{A)P{B)\: A a{XE),Be aiXp), 

E,FcI, d{E, F) > k, \E\ < u, \F\ < v}, 

where d{E, F) = mi{\\ X — y\\oo '■ X (z E , y (£ F} is the distance between index 
sets E and F. The field Xt is called a-mixing if for all u and v axik, u) — > 
as — > oo. 

We make a more precise assumption about the rate at which the mixing 
coefficients go to 0: there exist e > 0, r > 2 such that for all integers u,v > 
2,u + V < c, where c is the smallest even integer such that c > r, 

oo 

fc=i 

Here d is the dimension of the index set / C Z"', in our case d = 2. 

(A2) Fy and Fx\y have bounded densities with respect to the Lebesgue 
measure, /y and /x|Yi respectively. Moreover, Xt has compact support S, 
and/x|Y(-|y)>0forallye5f. 

(A3) For any yi, y2 G M^, any x G R U {oo}. 



/X rx 
fx,Yiz,yi)dz - fx,Y{z,y2) 
-oo J — oo 



dz 



< L\\yi - y2\ 



where /x,y = /x|y/y- 

(A4) The kernel W on is bounded, first-order Lipschitz continuous, 
symmetric, positive everywhere on W, J uW{u) du = 0, and / ||ii|| VF(n) du < 
oo. When T — > oo, the kernel bandwidth b = 0{T^^), with 6 > chosen so 
that 6< (r-2)/2p(p + l + r). 



4.2. Consistency of the MMM algorithm. Let F^^^-y^{x\y) = P{X^ < x\ 
= y) be the conditional distribution function of the synthesized Xt given 
its neighborhood Y* = y and let F^^ Yti^^y) — P{^t ^ ^j, Y* < y) be the 
joint distribution function of X^ and Y^, that is, the joint distribution of 
pixels m.a,w xw window. This is the distribution of interest because, at least 
for some suitably chosen w, it determines the human perception of texture, 
as discussed in the Introduction. Therefore one may argue that if this joint 
distribution is estimated correctly, then the synthesized texture will appear 
similar to the original. Our main result is the following theorem. 
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Theorem 1. Under assumptions ('Alj-('A4j, the joint distribution of 
and (the joint distribution of pixels in aw xw window) is estimated 
consistently for all t € [w, oo)^; 

(1) sup sup Yt(2;>y) - -^x,Y(2;,y)| ^ a.s.asT^oo. 

We also prove that the resamphng scheme correctly approximates the 
conditional distribution of a pixel given its neighborhood. 

Theorem 2. Under assumptions ( K\)-( KA), the conditional distribu- 
tion of X* given (the distribution of the right bottom corner pixel in a 
w X w window given the other p = w^ — 1 pixels in the window) is estimated 
consistently for all t G [w, oo)^: 

(2) sup sup |FJ ,Yj(a;|y) - i^x|Y(2;|y)| ^ a.s.asT^oo. 

These theorems establish the consistency of the joint distribution of pixels 
in a X tt; window. Inspection of the proof shows that the argument does 
not depend on the shape of Yt. All it requires is that the number of observed 
Yt goes to infinity, so that there are many matches to sample from. It also 
does not depend on the particular order in which the pixels are synthesized, 
because the argument is for a single given pixel in the synthesized texture 
as the size of the observed texture grows. If in the beginning the seed is 
chosen uniformly from the original, we start from a set of pixels whose joint 
distribution is consistent, and add pixels one by one in such a way that 
the joint distribution of the w x w window with that pixel in the bottom 
right corner remains consistent. Thus the joint distribution of every w x w 
window throughout the synthesized texture is estimated consistently. This is 
the main result we were interested in, since it suggests that the synthesized 
texture will appear similar to the original. 

4.3. Consistency of the spiral resampling algorithm. It is clear from the 
proofs of Theorems 1 and 2 that as long as we assume the mixing assumption 
(Al) holds, all kernels satisfy assumption (A4), and for all shapes of 

Y the distributions of X and Y satisfy assumptions (A2) and (A3), we will 
obtain the same consistency result: the conditional distribution of X^ given 
whatever pixels Y^ we conditioned on to fill it in will converge to the true 
distribution of X given Y as the size of the observed texture goes to infinity. 
Similarly, the joint distribution of X^ and Y^ will converge to the truth, 
at every pixel location in the synthesized texture. However, in this case the 
shape of the neighborhood depends on location and is constantly changing 
according to the spiral ordering. Therefore it is not clear whether we can 
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obtain a consistent estimate of the joint distribution in a square window 
over the whole synthesized texture. 

It would be natural to expect that, since we always condition on a {2w — 
1) X (2w — 1) window, or at least on what we can see from it, we will in the 
end get the joint distribution estimate in that window consistently. This was 
the motivation for the Efros and Leung algorithm, but it is in fact not true. 

Consider a simple counterexample for w = 2 shown in Figure 8. The first 
pixel to be filled in after the seed is Xl, and the part of the 3x3 window 
around it that we have so far is S* . We assume that the seed was sampled uni- 
formly from the original image, so the sampling distribution P{Sl) — > P{Si). 
Since the conditional distribution of given is estimated consistently, 
we have 

PiXlS*^)=P{X*\Sl)P{Sl)^P{Xi\Si)P{Si) = P{Xi,Si), 

so the joint distribution of Xf and is estimated consistently. However, 
this tells us nothing about the joint distribution of Xf and 5^. In fact, by 
construction of the synthesis algorithm we have 

P{Xl,Sl,S^) = P{Xl,S^\Sl)P{Sl) = P{Xl\Sl)P{S^\Sl)P{S1) 

^ P{Xi\Si)P{S2\Si)P{Si) = P{Xi\Si)P{Si,S2) 

+ P{Xi\Si,S2)P{Si,S2) = P{Xi,Si, S2). 

In other words, the synthesis algorithm makes Xf and S2 independent 
given SI, a property that the true distribution does not have in general. 
Therefore the estimate of the joint distribution in a {2w — 1) x {2w — 1) win- 
dow cannot be consistent. However, we may still get consistency in a smaller 
window, and in fact this example suggests that in order to get consistency 
for a window of size w x w, one must condition on a bigger window that 
contains all w x w windows or their parts which cover the pixel being syn- 
thesized. The size of this bigger window must be exactly {2w — 1) x {2w — 1) 
in order to cover all w x w windows containing the pixel at its center. Then 
each added pixel will fit in correctly with all w x w windows that contain 



SEED 




X, 


X. 











Fig. 8. Counterexample to consistency of the original algorithm: in the bootstrapped ver- 
sion, Xl and S2 are independent given Si . 
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it, and by induction the joint distribution in sdl w x w windows throughout 
the synthesized texture wih be estimated consistently. 
To formahze this claim, let 

Vt = {s: \\t - s\\oo <w,s^t} 

be the {2w — 1) x (2w — 1) window centered at t, and let = Xy^ be the 
pixel intensities in that window. Let Qt be the pixels in the w x w window 
located at [t — w + and let Qt = Xq^ be the corresponding vector of 

pixel intensities. Let 

F4(q)=P(Q*<q) 

be the cumulative distribution function of Qt in the texture synthesized by 
the spiral-order algorithm, and let Fq be the true cumulative distribution 
function of the w x w window in the original texture. 

Theorem 3. Suppose we observe {Xt:t e [l,Ti] x [1,T2]}. Let T = 
min(Ti,T2). If the field Xt satisfies assumption (Al), the distributions of 
X , V, X|V and (X, V) satisfy assumptions ( A2j and ( A3/, and the kernels 
\Y^p) for all p = w{w — 1), ... , 2w{w — 1) satisfy assumption ( A4J, then the 
distribution of Qt is estimated consistently for all t £1? : 

(3) sup |F4(q)-FQ(q)H0 a.s. as T ^ oo. 

This shows that the original algorithm of Efros and Leung also provides 
consistent estimates of the joint distribution in aw x w window, which may 
be an explanation for its perceptually good results, although the window 
is smaller than what the authors intended. One can similarly show that 
conditioning on a rectangular upper half- window when synthesizing in raster 
order (what we called the "rectangular scheme" in experiments) produces a 
consistent estimate of the distribution in a fi; x u; window, but not in the 
full {2w — 1) X {2w — 1) window. 

5. Discussion. The main contributions of this paper are the formal prob- 
abilistic framework for the nonparametric sampling algorithm of Efros and 
Leung [9] and the proof of its consistency. In particular, the fact that the 
joint distribution of pixels in a window of specified size is estimated con- 
sistently may explain the perceptually good results of the algorithm. This 
joint distribution is important for texture perception both from the Julesz 
school's point of view (fcth-order statistics) and from the multichannel fre- 
quency analysis perspective, since the joint distribution of pixels in the fil- 
ters' support window determines the joint distribution of filter responses. 

The proof of consistency requires a number of conditions which may look 
complicated, but are in fact perfectly plausible for most real textures. The 
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mixing assumption will not hold for purely periodic patterns, but as long as 
there is some stochastic variation, it becomes a natural description of tex- 
ture. As for the density assumptions, the grayscale pixel values are discrete, 
but there are sufficiently many of them to make the smoothness conditions 
plausible. The assumption of compact support is automatically satisfied for 
images since the number of grayscale or color values used is always finite, 
but may be a more substantial limitation for applications to other random 
fields. 

Our goal here was to provide asymptotic justification of the basic nonpar a- 
metric resampling principle rather than of any particular implementation of 
it. For instance, the resampling weights we use are slightly different from the 
ones originally used by Efros and Leung [9] . We feel that modifying the proof 
to accommodate details of a particular implementation, either the original 
or one of the several follow-up versions, is possible but unnecessary, since 
this is intended as a justification of their common underlying principle. 

One issue that we did not address is determining the correct window size 
to use in the resampling algorithm in order to obtain perceptually good 
results. The asymptotics guarantee only that the distribution in a certain 
size window is estimated consistently. They say nothing about whether the 
distribution of the whole synthesized texture is consistent, unless we are 
willing to assume that the true distribution is a Markov random field with 
the neighborhood of the same or smaller size as our chosen window. Ex- 
perimentally it appears that the window big enough to contain the largest 
texture "element" (determined by the user) works for the resampling algo- 
rithm. Automatically determining the correct window size for nonparametric 
resampling algorithms, and the scale of a given texture in general, is an open 
problem. In classic MBB, the size of the block can be chosen to optimize 
the bias and variance of the estimator of the mean; this cannot be applied 
here as the goal is to reproduce the joint distribution of pixels rather than 
to estimate the mean. One could use cross-validation, that is, compare the 
synthesized texture to the original for several window sizes using a texture 
similarity measure (see, e.g., [16]), and pick the window size that maximizes 
this similarity. This approach is somewhat computationally expensive, and 
there is no guarantee that the similarity measures used for classification and 
segmentation will be adequate for human perception. 

Another tuning parameter set by the user is the kernel bandwidth b or 
e. In the algorithm implementations they were determined empirically and 
held constant for all the textures, so it only needed to be done once. Meth- 
ods for bandwidth selection used in density estimation could be applied here, 
although one does not expect drastic practical improvements. Another pos- 
sibility is to select both the window size and the kernel bandwidth by cross- 
validation, which may yet become preferred over the standard computer 
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vision practice of user-selected parameters as computing becomes faster and 
similarity measures get fine-tuned to mimic human perception more closely. 

A natural question to ask is what else this algorithm could be useful for 
beyond texture synthesis. We believe it will do well for the usual bootstrap 
task of estimating the mean and variance of a random field; establishing 
its rates of convergence and comparing them to, for example, MBB could 
be a direction for future work. A drawback from the computer vision point 
of view is that this type of bootstrap cannot be used to perform texture 
classification or recognition, since no generative model is fit to the data. It 
can, however, be used for estimating various texture parameters, such as the 
texture scale, via cross-validation as described above; these parameters can 
in turn be useful for classification and other higher-level tasks. 



APPENDIX: PROOFS 

Everywhere in the proofs we suppress the dependence on t in y and 
^XtlYt avoid clutter, and remind the reader all theorems hold for all 
appropriate t Gl?. Before proceeding to the proofs of our results, we state a 
moment inequality for mixing random fields which we will need below. The 
proof of this inequality and many other useful ones can be found in [7]. 



Lemma A.l (Moment inequality). Let Ft be a real-valued random field 
indexed hy I dll^ satisfying conditions (Al). If EFt = 0, Ft G L"^"*"^ and 
T>2, then there is a constant C depending only on t and mixing coefficients 
of Ft such that 



E 



tei 



<Cmax(L(T,e),L(2,e)^/2) 



where 



L{fi,e)=Y,{E\Ftr+T/^^+'^ = Y,\\Ft 
tei tei 



We will start from the proof of Theorem 2 (consistency of conditional 
distributions for the MMM algorithm). Note that, for any x € M, y G S^, 

n|Y(^|y) = E l(-oo,x](^t)VFb(y -Yt)/Y. Whiy - Y,), 



tei 



sei 



Fx\Yix\y) = j '\-{-oo,x]{z)fx\Y{z\y) dz. 



We first prove the following unconditional result. 



18 E. LEVINA AND P. J. BICKEL 

Lemma A. 2. Under assumptions ( Al )-( KA), for any x G ^ 



sup 



\,^^{-oo,x]i^tWb{y - ^t) - f i{-oo,x]{z)fx,Y{z,y)dz 
tal 



a.s. 



From this lemma, we can immediately get a useful corollary. Let /-yly) 
T Ste/ Wb{y — Yj). Then setting x = oo in Lemma A. 2 we get: 

Corollary A.l. Under the conditions of Theorem 2, 

sup |/^(y)-/Y(y)HO a.s. 

yeSP 

Proof of Lemma A. 2. Let us introduce the notation 
4(y) = ^J2 l(-oo,x] {Xt)Wb{y - Yt), 

r{y)= j l^^oo,x]{z)fx,Y{z,y)dz. 
The lemma will be proved by showing 



(4) 

and 
(5) 



sup \Er^{y) — r(y)| a.s. 



sup |r^(y) - Er^{y) \ a.s. 
y&SP 



First let us compute 

Er*T{y)= r I fxr^{z,M)Wb{y-Vi)dMdz 

J~oo JueRP 

u-y 



oo JueRP 



fx,Y{z,n)W 



du dz 



= r f fx,Y{^M + y)W{Y)dYdz, 
J-oo Jv£RP 

where v = (u — y)/b. Also note that, since is a kernel, 

fx,Y{z,y)dz= / / fx,Y{z,y)W{Y)dvdz 

-oo J-oo JweRP 



Now (4) becomes 

sup \Er^{y) - r(y) 
y&SP 



■■ sup 

yeSP 



r I W{w){fx,Y{zM + y)- fx,Y{z,y))d^dz 

J-oo JveRP 



< 
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sup 

yeSPJvGRP 



<bLj ||v||VF(v)dv = 0(6) = 0(r-^) ^0 



3j.S. , 



where the last inequahty follows from assumption (A3). 
Equation (5) is the main part of the proof. Define 

Zf,T(y) = l(^ooMi^t)Wb{y - Yt) - £;i(„oo,.](^t)^b(y - 
Note that EZt.T = 0, and claim (5) is that 



sup 



a.s. 



Recall that Xt has compact support S. Therefore we can cover with 
Nt cubes li^T with centers yi and sides Lt- Then 



sup 

yesp 



tei 



max sup 

l<i<NT y^SPnli._T 



tei 



< max 

l<i<ArT 



7f;Y. ^t'Tiyi) 



+ max sup 

l<i<NT y(ZSPnIi^T 

= 1 + 11. 

First let us deal with term II: 



;Y.i^t,T{y) - Zt^riyi)) 



II < max sup — 

l<j<7VTygSPn/,,T ^ 



J2miy-Yt)-Wt{yi-Yt 



Lie/ 



+ E\Wh{y - Yt) - Wb{y^ - Yt 



< Ci max sup b ^ 

l<i<NT y(ZSPnIi,T 



y-yi 



< Cib~P-^LT. 

The last line follows from the Lipschitz assumption on the kernel (A4). 
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If we let the side of the cubes Lt = ebP+'^ = 0{T~^^p+'^^), then term II 
is bounded above by e. Note that the number of cubes Nt = 0{\/L^) = 

We will use the Borel-Cantelli lemma to show that term I goes to 0. By 
elementary inequalities, 



PT(I>e 



Nt / 



> e 



(6) 



< Nt max P 

l<i<NT 



< Nt max 

l<i<NT 



1 

T 



t€l 

E\EtZt,T{y^W 



> e 



To bound the last term, we apply the moment inequality of Lemma A.l 
to random variables. 



Ft = Z, 



{Xt)Wb{y - bYt) - El 



(— oo,a:] 



{Xt)Wi,{y-bYt), 



so we need to check that they satisfy assumption (Al). First note that 

Ft = f{XuYt) = f{X^t^.^+^^t]^). 

Since the definition of mixing coefficients only depends on u-algebras 
generated by Ft, we may instead consider larger cj-algebras generated by 
(Xj,Yj) — the mixing coefficients of Ft can only be smaller than those of 
(Xj,Yj). Therefore, 

apik, u, v) < snp{\P{AB) - P{A)P{B) \:Aea{{X, Y)e),B G a{{X, Y)f), 

E,F Cl,d{E,F) >k,\E\ < n, |F| <v}. 

Now notice that {X, Y)e = Xe' where E' = {t + a e F.t e E,a e [-w, 0] x 
[—w, 0]}, that is, all points in E and everything in a w-hy-w square to the left 
and above them. Similarly, let F' = {t + a € F.t £ F,a G [—w, 0] x [—w, 0]}. 
If \E\ <u and |F| < f , then < up'u and j-F'| < nP'v. Also, if the distance 
between E and F, d{E, F) > k, then d{E' , F') >k — w. Therefore, 

apik, u, v) < ax{k — w, w'^u, w^v), 

and assumptions (Al) are clearly satisfied for the field Ft, possibly with 
different constants. 

Now we can apply Lemma A.l to identically distributed mixing variables 
Zt,T- Note that since the kernel W is bounded, \Zt,T{yi) \ < Mb'P < MT^p. 
Also note 

L(T,e) = T{E\Zt,Tr+T^''^^'^ < TiMT^Py = M^T^p^+^ 
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and 

and since we take r > 2 



E 



t€l 



/2 



Now we can go back to (6) and plug in the moment bound. We get 

PTil>e)<NT max g(r)r^P"+"/^ ^ Q^r^Spip+i)+Spr~r/2y 
^ ^ - ^ i<i<NT e^r^ ^ ^ 

Since we assumed r > 2 and 6 < {t — 2)/2p{p + 1 + r), we can always 
choose 6 such that 6p{p + 1) + r(5p — 1/2) < —1. Therefore ^tPt{I > e) < 
oo and by the Borel-Cantelli lemma 1^0 a.s. This concludes the proof of 
Lemma A. 2. □ 

Proof of Theorem 2. Note that 

Eie/l(-oo,x](^t)W^6(y-Yi) _ 4(y) 



^xiY(^ly) 



Et6/VF6(y-Yi) /^(y) 



and that r(y) = -Fx|Y(^|y)/Y(y)- Then the expression under sup in (2) 
becomes 

|F^|Y(x|y)-F^|Y(^|y)| 
1 



/^(y) 



4(y) - Ky) + Ky) - i^x|Y(^|y)/Y(y)l 



< 7r^(I^T(y) - ^(y)l + i^x|Y(^|y)l/Y(y) - h{y)\). 

From Lemma A. 2, Corollary A.l and assumption (A2) it follows that 
sup \Fx\YiAy) - -^x|Y(a;|y)| a.s. 

To establish the uniform convergence over all x E M, we use the argument of 
the Glivenko-Cantelli theorem: for each x, F'^y^{x\y) — >i^x|Y(^|y) a-s- by 
the ergodic theorem. Since F'^^-y are nondecreasing, and Fx\y is bounded 
and continuous, it follows that convergence is uniform over all x G M. □ 



FlY{^,y) = P{X;<x,Y;<y)= f f flY{z,n)dzdn 

J u<y J z<x 

= I I rx\Yi^\u)f^iu)dzdu= f Fj^|Y(xju)/^(u)du. 

Ju<yJz<x Ju<y 
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Similarly, 



Fx,Yix,y)= Fx|Y(a:|u)/Y(u)du. 

Ju<y 



Recall that we assumed /y is bounded and has compact support S^. Then 
we have the bound (indexes are omitted for clarity) 

\F*{x,y)-F{x,y)\ 

<f F*(x|u)|r(u)-/(u)|du+ / |F*(x|u)-F(x|u)|/(u)du 

<1- J \f*{u)- f{u)\du + M J jF*(x|u) -F(x|u)|(iu 
<\SP\ sup |/*(u)-/(u)| + M|5P|sup|F*(x|u)-F(x|u)|. 

UG5P u 

Taking the supremum over x and y, we get 
sup sup \F*{x,y) - F{x,y)\ 

<Ci sup |r(y)-/(y)| + C2Supsup \F* {x\y) - F{x\y)\ ^ a.s. 

by Theorem 2 and Corollary A.l. □ 

Proof of Theorem 3. Let Yt = {Xs -.s&Vt^s^t} be the vector of 
pixel values in Vt that come before Xt in the spiral ordering. The first thing 
we need to verify is that if X and V [the full {2w — 1) x (2w — 1) window 
around X] satisfy assumptions (A2) and (A3), so will X and Y. 

Assumption (A2) says that V and X|V have bounded, positive every- 
where densities with compact support. This will obviously hold for Y and 
X|Y as well. 

Now let us verify that assumption (A3) holds for X and Y. Write V = 
(Y,U), where U is the part of V that comes after X . Then we can write 

fx,Y{x,y) = J fx,Y,ijix,y,u) du. 

Let vi = (yi,u) and V2 = (y2,u). Then 



fx,Y{r,yi)dr 



fx,Y{r, y2)dr 



r rx rx 

< / fx,Y,uir,yi,u)dr - / /x,Y,u(r, y2, u) dr 



du 



(A3) 



< |5|'i'-(")L|Ivi-V2||<L||yi-y2| 
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Now we can use Theorem 1 to conclude that 

(7) supsup|FJ Y(^>y) ~ -^A:,Y(2;)y)| ^ a.s. as T — > oo. 

X y ' 

The consistency of the distribution estimate m a w x w window can now 
be proved by induction. Let Qt be the set of all w x w windows and parts 
of such windows that are filled in by the time the pixel at location t is 
synthesized. The induction hypothesis is that, for all t € Z^, if Q € Qt and 
Q = Xq, then 

(8) sup|F^(q) -FQ(q)| ^0 a.s. as T ^ oo. 

q 

(1) For the first (2?i; — 1)^ locations tj, i = 0, . . . ,'iw{w — 1), the windows 
in Qt are sampled uniformly from the observed texture, and therefore Fq 
is the empirical distribution function of the corresponding window in the 
texture sample. The claim (8) is true by the Glivenko-Cantelli theorem. 

(2) Suppose (8) holds for all s ^t. For all Q G Qt that do not include t, 
the claim holds since these sets also belong to Qs for some s ~<t. For sets 
Q that include t, we can write Q* = {X* , S*), where S* is the vector of all 
pixel intensities in Q other than . Since the size of Q is at most w x w, 
by construction of our conditioning neighborhood all the pixels of S* are 
included in Y^. By (7), the joint distribution of X^ and converges to 
the truth, and therefore so does the joint distribution of X^ and . This 
establishes the induction hypothesis for all Q € Qt. □ 
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